NCR Docket No. 11092 

AMENDMENTS TO THE CLAIMS 

1. (currently amended) A method implemented in a computer system, for clustering a
string, the string including a plurality of characters, the method including:

identifying R unique n-grams T1...R in the string;
for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first
    threshold:
        clustering the string with a cluster associated with Ts;
    otherwise:
        for every other n-gram Tv in the string T1...R, except s:
            if the frequency of n-gram Tv is greater than the first threshold:
                if the frequency of an n-gram pair Ts-Tv is not greater than a
                second threshold:
                    clustering the string with a cluster associated with
                    the n-gram pair Ts-Tv;
                otherwise:
                    for every other n-gram Tx in the string T1...R, except s and v:
                        clustering the string with a cluster
                        associated with an n-gram triple Ts-Tv-Tx;
            otherwise:
                do nothing.
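
The nested conditions recited in claim 1 can be sketched in code. The following is only an illustrative reading, not the applicant's implementation: the function names, the default n-gram length of 3, the dictionary layout of the statistics, the treatment of a pair as unordered, and the treatment of an unseen n-gram as having frequency 0 are all assumptions made for the example.

```python
def unique_ngrams(s, n=3):
    """Return the unique n-grams of s in first-seen order (hypothetical helper)."""
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_keys(s, gram_freq, pair_freq, first_threshold, second_threshold, n=3):
    """Return the set of cluster keys the string s would be clustered with.

    gram_freq maps an n-gram to its frequency; pair_freq maps a sorted
    2-tuple of n-grams to its frequency. Both layouts are assumptions.
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:
        # Low-frequency n-gram: cluster on the n-gram alone.
        if gram_freq.get(ts, 0) <= first_threshold:
            keys.add((ts,))
            continue
        for tv in grams:
            if tv == ts:
                continue
            if gram_freq.get(tv, 0) > first_threshold:
                pair = tuple(sorted((ts, tv)))
                if pair_freq.get(pair, 0) <= second_threshold:
                    # Low-frequency pair of high-frequency n-grams.
                    keys.add(pair)
                else:
                    # High-frequency pair: fall back to triples.
                    for tx in grams:
                        if tx != ts and tx != tv:
                            keys.add(tuple(sorted((ts, tv, tx))))
            # otherwise (Tv is itself low frequency): do nothing
    return keys
```

Representing each cluster by a sorted tuple of n-grams means that the pair Ts-Tv and the pair Tv-Ts map to the same cluster; the claim language does not say whether that is intended.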

2. (original) The method of claim 1 further including compiling n-gram statistics. 

3. (original) The method of claim 1 further including compiling n-gram pair statistics. 
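
Claims 2 and 3 recite compiling n-gram statistics and n-gram pair statistics without saying how frequency is measured. As a minimal sketch, assuming that the frequency of an n-gram (or pair) is the number of strings in which it occurs at least once, the statistics could be compiled as follows. The function names and the default n-gram length are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unique_ngrams(s, n=3):
    """Return the set of unique n-grams of s (hypothetical helper)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def compile_statistics(strings, n=3):
    """Count, per n-gram and per unordered n-gram pair, the strings containing it.

    Assumption: frequency means the number of strings containing the item,
    not the total number of occurrences.
    """
    gram_freq = Counter()
    pair_freq = Counter()
    for s in strings:
        grams = sorted(unique_ngrams(s, n))
        gram_freq.update(grams)
        # Sorted input makes each pair a canonical (smaller, larger) tuple.
        pair_freq.update(combinations(grams, 2))
    return gram_freq, pair_freq
```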

4. (currently amended) A method implemented in a computer system, for clustering a
plurality of strings, each string including a plurality of characters, the method including:

identifying unique n-grams in each string;

clustering each string with clusters associated with low frequency n-grams
from that string; and

clustering each string with clusters associated with low-frequency
pairs of high frequency n-grams from that string.

5. (original) The method of claim 4 further including: 

where a string does not include any low-frequency pairs of high frequency
n-grams, associating that string with clusters associated with triples of
n-grams including the pair.

6. (currently amended) A method implemented in a computer system, for clustering a
string, the string including a plurality of characters, the method including:

identifying R unique n-grams T1...R in the string;
for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first
    threshold:
        clustering the string with a cluster associated with Ts;
    otherwise:
        for i = 1 to Y:
            for every unique set of i n-grams Tu in the string T1...R, except s:
                if the frequency of the n-gram set Ts-Tu is not greater than a
                second threshold:
                    clustering the string with a cluster associated
                    with the n-gram set Ts-Tu;
        if the string has not been associated with a cluster with this value of Ts:
            for every unique set of Y+1 n-grams Tuy in the string T1...R, except s:
                clustering the string with a cluster associated with
                the Y+2 n-gram group Ts-Tuy,
where T1...R is a set of n-grams, R is the number of elements in T1...R, Ts and
Tu are members of T1...R, Tuy is a subset of T1...R and i and Y are
integers.

7. (original) The method of claim 6 where Y = 1.

8. (original) The method of claim 6 further including compiling n-gram statistics. 

9. (original) The method of claim 6 further including compiling n-gram group statistics. 
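
The generalized procedure of claim 6 can be sketched in the same spirit. This is an illustrative reading under stated assumptions rather than a definitive implementation: the function name, the single `group_freq` dictionary keyed by sorted tuples of n-grams, the default n-gram length, and the treatment of an unseen group as having frequency 0 are all hypothetical.

```python
from itertools import combinations


def cluster_keys_general(s, group_freq, first_threshold, second_threshold, y=1, n=3):
    """Return cluster keys for s, trying groups of up to y extra n-grams per Ts.

    group_freq maps a sorted tuple of n-grams (of any size, including 1)
    to its frequency; this layout is an assumption.
    """
    grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
    keys = set()
    for ts in grams:
        if group_freq.get((ts,), 0) <= first_threshold:
            keys.add((ts,))
            continue
        others = [g for g in grams if g != ts]
        clustered = False
        for i in range(1, y + 1):
            for tu in combinations(others, i):
                group = tuple(sorted((ts,) + tu))
                if group_freq.get(group, 0) <= second_threshold:
                    keys.add(group)
                    clustered = True
        if not clustered:
            # No sufficiently rare group found for this Ts:
            # fall back to every group of Ts plus y+1 other n-grams.
            for tuy in combinations(others, y + 1):
                keys.add(tuple(sorted((ts,) + tuy)))
    return keys
```

With `y=1` (claim 7) the fallback groups are triples, which parallels the pair-then-triple structure of claim 1, although claim 6 does not require the second n-gram to be a high-frequency one.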

10. (currently amended) A computer program, stored on a tangible storage medium, for
use in clustering a string, the program including executable instructions that cause a
computer to:

identify R unique n-grams T1...R in the string;
for every unique n-gram Ts:
    if the frequency of Ts in a set of n-gram statistics is not greater than a first
    threshold:
        cluster the string with a cluster associated with Ts;
    otherwise:
        for every other n-gram Tv in the string T1...R, except s:
            if the frequency of n-gram Tv is greater than the first threshold:
                if the frequency of an n-gram pair Ts-Tv is not greater than a
                second threshold:
                    cluster the string with a cluster associated with the
                    n-gram pair Ts-Tv;
                otherwise:
                    for every other n-gram Tx in the string T1...R, except s and v:
                        cluster the string with a cluster associated
                        with an n-gram triple Ts-Tv-Tx;
            otherwise:
                do nothing.

11. (original) The computer program of claim 10 further including executable
instructions that cause a computer to compile n-gram statistics.

12. (original) The computer program of claim 10 further including executable 
instructions that cause a computer to compile n-gram pair statistics. 



